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Abstract. In recent years, large high-dimensional data sets have become com- 
monplace in a wide range of applications in science and commerce. Techniques 
for dimension reduction are of primary concern in statistical analysis. Projection 
methods play an important role. We investigate the use of projection algorithms 
that exploit properties of the a-stable distributions. We show that la distances 
and quasi-distances can be recovered from random projections with full statistical 
efficiency by L-estimation. The computational requirements of our algorithm are 
modest; after a once-and-for-all calculation to determine an array of length k, the 
algorithm runs in 0{k) time for each distance, where k is the reduced dimension of 
the projection. 
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1 Introduction 

Let y be a collection of n points in m-dimensional Euclidean space, R™, 
where the dimension m is large, of the order of hundreds or thousands. We are 
interested in distance-preserving dimension reduction via random projections, 
where the points in V are randomly projected onto a lower fc-dimensional 
space such that pairwise distances between original points are well preserved 
with high accuracy. Statistical analyses based on pairwise distances between 
points in V can be performed on the set of projected points, thus reduc- 
ing the computational cost of computing all pairwise distances from 0(n^m) 
to 0{nmk -\- n?k). Important applications of distance-preserving dimension 
reduction are approximate clustering in high dimensional spaces and compu- 
tations over streaming data, for example Hamming distance approximations. 

We consider the problem of preserving la distances (quasi-distances) de- 
fined by da{u,v) = \ui - for (ui, . . .,Um) and {vi, . . .,Vm) e 
, for a e (0,2]. We remark that [daiujv)]^^" is a distance measure for 
a > 1, but not for a < 1, and that the Hamming distance is obtained as 
limQ^o da{u,v). 

In the case a = 2, the lemma of Johnson and Lindenstrauss (1984) demon- 
strates the existence of a projection map pa : M"* i-^- such that 

(1 - e)da{u,v) < da{paiu),Pa{v)) < (1 + e)daiu,v) \fu,v G V, (1) 
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provided that k > ko — 0{\ogn/e^). 

We are interested in dimension reduction in la, for general a € (0,2], 
using stable random projections. See Indyk (2006) for an introduction to 
this technique. The goal will be to satisfy the inequality in ([1]) with high 
probability. In Section [2] we define stable random projections, and show that 
distance preserving dimension reduction in reduces to estimation of the 
scale parameter of the symmetric, strictly stable law, where the latter is 
discussed in Section [3l In Section |4] we present an asymptotically efficient 
estimator of the scale parameter, followed by numerical results in Section O 

2 Random projections 

A random variable X with distribution F is said to be strictly stable if for 
every n > 0, and independent variables Xi, . . . ,X„ ~ there exist con- 

T> 

stants a„ > such that Xi + . . . + Xn — a„X, where V denotes equality 
in distribution. The only possible norming constants are a„ — ri^/", where 
< a < 2; the parameter a is known as the index of stability (Feller, 1971). 
The densities of stable distributions are not available in closed form, except 
in a few cases: Cauchy(a — 1), Normal(Q: ~ 2) and Levy(a — 0.5). 

We are interested in symmetric, strictly stable random variables of index 
a and parameter 9 > 0, with characteristic function Eexp{itX) = e~^'*' , 
defined for t real. Let f{x; a, 0) and F{x] a, 0) be the density and distribution 
function of X. Of particular interest is the following property. Suppose that 
Xi, . . . , Xm are independent variables with distribution function F{x] a, 1) 
and that mi, . . . , Um are real constants, then "YlT^i "^i-^i ^ F{^'i ^) where 
9 — X^iLi I'^il"- If ''^ii ■ ■ ■ T^rn is another sequence of real constants, then it 
follows that 'Y^™^i{ui — Vi)Xi ~ F(x] a, 9) with 9 — da{u, v). 

We assume that the data V is arranged into a matrix V with n rows and 
m columns, i.e. one row for each of the n data points. Let X G M'"^*' be 
a matrix whose entries are independent symmetric, strictly stable random 
variables with index a, and 9=1 for fixed < a < 2. We term X a random 
projection matrix mapping from R™ to R'^ via the map V i— > VX. 

Let B = VX and consider u and u, the ith and jth rows of V, i ^ 
j, corresponding to the ith and jth data points in V. Let a and b be the 
corresponding rows of B. Then, for z = 1, . . . , A:, we have 



az ^ bz = y {ui — vi)Xiz ~ F{x; a, dij), independently for z = 1, . . . , fc. 



where dij = da{u, v). Our aim is to recover da{u, v) from (a, b). Since {a^ — bz '■ 
z = 1, . . . ,k} provides a sample of values from a distribution with parameter 
doiiu,v) we are in a position to apply the usual repertoire of statistical es- 
timation techniques to obtain estimators with specified accuracy. This is of 
particular relevance in the context of streaming data, where da, for a < 1, 



m 
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is a meaningful measure of the pairwise distance between streams; in the 
extreme case of a — s- 0, da tends to the Hamming distance, the number of 
mismatches between two sequences. When a € [1, 2], the la distance is given 
by dl/°' with potential interest for clustering in high dimensional spaces. In 
the case a & [1,2] the statistical problem reduces to estimating the standard 
scale parameter of the symmetric, strictly stable law. 

3 Estimation of the scale parameter 

The problem of parameter estimation of the stable law is particularly chal- 
lenging due to the fact that the density function does not exist in closed form 
for most values of a G (0,2]. The cases a — 1 and a — 2 have been ex- 
tensively studied. See for example (Li et al., 2007) for references. Maximum 
likelihood estimation of the parameters was first attempted in DuMouchel 
(1973) who showed that the MLE's are both consistent and asymptotically 
normal, and computed estimates of the asymptotic standard deviations and 
correlations. Matsui and Takemura (2006) improved upon these estimates by 
providing accurate approximations to the first and second derivatives of the 
stable densities. Nolan (2001) proposes an iterative approach to maximum 
likelihood estimation of the parameters, implemented in his software package 
STABLE, available at |http : / / www. robust analysis .com/ 

We compute approximations to the second derivative of the stable density 
and the logarithm of a transformed density by a second order finite difference 
scheme with grid width h — 0.01 using the integral form of the density func- 
tion given in Nolan (2007), as implemented in the contributed package fBasics 
to R; Figure [T] displays the approximations. We obtained similar estimates 
using the expressions in Matsui and Takemura (2006). 

Among the first estimators of the scale parameter are those of Fama and 
Roll (1968) based on sample quantiles, for a > 1. The known form of the 
characteristic function of the stable law has proved to be a useful tool for 
parameter estimation (Kogon and Williams, 1998). More recently, Li (2008) 
proposes the harmonic mean estimator for a < 0.344 and the geometric mean 
estimator for 0.344 < a < 2 to estimate 9; combined, these estimators have 
an asymptotic relative efficiency exceeding 70% and increasing to 100% as 
a 0. Furthermore, Li and Hastie (2008) propose a unified estimator based 
on fractional powers with ARE no smaller than 75%, out-performing the 
combined harmonic and geometric mean estimators, and with good small 
sample performance for values of k as small as 10; we point out that the 
fractional power estimator has been proposed previously in Nikias and Shao 
(1995). Our approach is to use L-estimation to estimate the logarithm of 
the scale parameter. We will show that the method is simple and practical, 
involving only a precalculated table and then a subsequent sum of products 
to achieve asymptotic efficiency of 100%. 
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Fig. 1. Approximations to the second derivative of f{x; a, 1) for a € [0.1, 2]. 

4 The approach of L-estimation 

Consider a random sample xi,...,Xk ^ f{x; a, 6) and let 7 = Define 

Ui := log \xi\ = ^ + Zi, i = l,...,k, 

where Zi is distributed as the logarithm of the absolute value of a symmetric, 
strictly stable random variable of index a and 6=1, and /z = log 7. Let fo{z) 
and Fq(z) denote the p.d.f. and distribution function of Zi, respectively. So, 
(2/1, • • • , Vk) is a random sample of variables with p.d.f. fo{y — /u), where 

fo{z) = 2e^f{e^; a, 9), -00 < z < 00. 

The problem reduces to that of estimating the location parameter /i for 
the family of distributions {/o(y — /u), /z G IR.}, based on a random sample 
from fo{y- n). 

The method of L-estimation defines the estimate /t as a weighted linear 
combination of order statistics y(i), ■ ■ ■ , y(k) - Chernoff et al. (1967) prove that 
when the weights are suitably chosen , (/i — ]E(/t)) is asymptotically nor- 
mal with mean and variance Consequently the estimator (i is asymp- 
totically efficient. 
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In large samples, the weights can be approximated by 

where — log/o(y). Furthermore, the systematic bias-correction term is 
given by 

1 r°° 

BC = E(/i) - fi= / ze"{z)fo{z)dz, 

hi J -oo 

so, the corresponding bias-corrected estimator is fiBC = X]i=i f^iky{i) — BC. 

Table [T] gives the Fisher information and the bias for various values of a, 
obtained numerically by making use of approximations to the stable densities 
and quantiles in the R package fBasics. The values of Fisher information agree 
with those presented by Matsui and Takemura (2006) to within 3-4 significant 
digits for a G (0.3, 1.8), but appear to be slightly different for a outside this 
range; for example, for a = 1.8, our estimate is 1.3920, whereas that of Matsui 
and Takemura (2006) is 1.3898. 
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BC 
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BC 


0.14 


0.0183 


-1.5253 


0.6 


0.2325 


-0.4380 


1.1 


0.5774 


0.0762 


1.6 


1.0780 


0.4183 


0.15 


0.0210 


-1.4522 


0.65 


0.2626 


-0.3658 


1.15 


0.6182 


0.1119 


1.65 


1.1459 


0.4497 


0.2 


0.0363 


-1.1956 


0.7 


0.2937 


-0.2995 


1.2 


0.6604 


0.1466 


1.7 


1.2198 


0.4741 


0.25 


0.0547 


-1.0420 


0.75 


0.3256 


-0.2388 


1.25 


0.7042 


0.1804 


1.75 


1.3011 


0.4874 


0.3 


0.0755 


-0.9331 


0.8 


0.3585 


-0.1834 


1.3 


0.7499 


0.2138 


1.8 


1.3920 


0.4875 


0.35 


0.982 


-0.8438 


0.85 


0.3924 


-0.1324 


1.35 


0.7976 


0.2470 


1.85 


1.4968 


0.4743 


0.4 


0.1226 


-0.7611 


0.9 


0.4272 


-0.0852 


1.4 


0.8476 


0.2804 


1.9 


1.6270 


0.4480 


0.45 


0.1483 


-0.6790 


0.95 


0.4631 


-0.0412 


1.45 


0.9002 


0.3142 


1.95 


1.7882 


0.4122 


0.5 


0.1753 


-0.5965 


1.0 


0.5 





1.5 


0.9558 


0.3487 


1.99 


1.8861 


0.3912 


0.55 


0.2034 


-0.5154 


1.05 


0.5379 


0.0390 


1.55 


1.0148 


0.3838 


2.0 


2.0 


0.3687 



Table 1. Fisher information If^ for the parameter and the systematic bias (BC) 
in estimating ^ by efficient L-estimation, tabulated for values of a G [0.14, 2]. 



In the case a > 1 we will be interested in estimating 7 = e^, corre- 
sponding to the la norm. We propose the estimator 7 = exp {fiBc)- It follows 
that Vk{^ — 7) is asymptotically normal with mean and variance I//7, 
where 1^ is the Fisher information about the scale parameter 7 contained in 
(xi, . . . , Xk), or equivalently (j/i, . . . , yk)- By second order Taylor expansion, 
we show that the bias incurred by exponentiating is approximately 

E(7) « 7 + ^7! {fiBc 

so the bias-corrected estimator '^bc 
of order 0{l/k^). 



= 7^1 — 2r/~) unbiased up to terms 
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Fig. 2. This plot displays approximate weights Wik for t := -^r^ £ (0.01, 0.99) and, 
starting from the left, following the peaks, a = 0.15, 0.3, 0.5, 0.8, 1.0, 1.2, 1.5, 1.8, 2.0. 



In practice, we use the following approximation for the weights in ^ 



normalised to sum to 1; Figure [2] displays the weights for various values of a. 
For a small, the weighted sum in the formulation of the L-estimator places 
significant weight on the small order statistics, and negligible weight on the 
large order statistics, gradually shifting the weight balance towards large 
order statistics as a — > 2. The bias-corrected estimator of 7 is computed as 
follows: 



^Bc = exp I ^ (y(i) -^o^^-j 



1 



Similar calculations provide an asymptotically efficient estimator for 9; a 
more relevant parameter for values of a less than 1 . 
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Fig. 3. Comparison in terms of mean square error (m.s.e.) of the L-estimator of 
with the fractional power estimator of Li and Hastie (2008) (10^ replicates). 
The Cramer-Rao lower bound is plotted for comparison. The equivalent plot for 
estimators of 7 = 6^^" shows a similar pattern. The perturbation in the m.s.e. for 
the L-estimator at a = 1.9 is caused by an oscillation in the weight function; it can 
be minimised by selective trimming. 



5 Numerical results 

The L-estimator is easily computable as the weights depend only on a and 
fc, and can be tabulated once-and-or-all for any required value of a. The cal- 
culation of these terms depends on accurate approximations to the quantiles 
and the density of the symmetric, strictly stable distribution. Whereas it is 
possible to obtain a good approximation to the MLE via an iterative proce- 
dure with a suitably large table of pre-calculated derivatives for fixed a, the 
L-estimation procedure has the advantage of achieving the same asymptotic 
performance without iteration. The L-estimator has modest computing re- 
quirements; it has 0{k) running time and 0{k) storage requirement given a 
table of pre-calculated weights for given a. 

To confirm the superior performance of out L-estimator we have simulated 
its mean square error for various sample size and various values of a. Figure [3] 
shows that, as expected, the L-estimator has smaller mean square error than 
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the estimator of Li and Hastie (2008). The perturbations in the m.s.e. of the 
L-estimator at a = 1.9 are caused by an osciUation of the weight function 
which becomes negative when is close to 1 (see Figure [2|) . The effect can 
be minimised by using a trimmed version of the L-estimator. This is work in 
progress and will be reported elsewhere. 
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